46 research outputs found
Reinforcement Learning for the Unit Commitment Problem
In this work we solve the day-ahead unit commitment (UC) problem by
formulating it as a Markov decision process (MDP) and finding a low-cost policy
for generation scheduling. We present two reinforcement learning algorithms
and devise a third one. We compare our results to previous work that uses
simulated annealing (SA), and show a 27% improvement in operation costs, with
a running time of 2.5 minutes (compared to 2.5 hours for the existing
state of the art).
Comment: Accepted and presented at IEEE PES PowerTech, Eindhoven 2015, paper ID 46273
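As a reading aid, here is a minimal, self-contained sketch of the MDP framing
this abstract describes; the toy demand, capacity, and cost figures and all
names are illustrative assumptions, not the paper's formulation or data:

```python
import itertools

# Toy day-ahead UC as an MDP: state = (hour, unit on/off flags),
# action = next commitment vector, reward = negative operating cost.
DEMAND = [100, 150, 200, 180]      # MW demand per hour (made-up data)
CAPACITY = [80, 120, 150]          # MW per generating unit
RUN_COST = [10, 20, 35]            # $/MW while a unit is committed
STARTUP_COST = [500, 300, 100]     # $ per cold start

def step(state, action):
    """Apply a commitment decision; return (next_state, reward)."""
    hour, status = state
    startup = sum(STARTUP_COST[i] for i, on in enumerate(action)
                  if on and not status[i])
    committed = sum(CAPACITY[i] for i, on in enumerate(action) if on)
    fuel = sum(RUN_COST[i] * CAPACITY[i] for i, on in enumerate(action) if on)
    shortfall = max(0, DEMAND[hour] - committed)
    penalty = 1e4 * shortfall      # heavy penalty for unserved demand
    return (hour + 1, action), -(startup + fuel + penalty)

# A naive one-step greedy policy over the tiny discrete action space;
# an RL algorithm would instead learn a policy over this MDP.
state, total = (0, (False, False, False)), 0.0
while state[0] < len(DEMAND):
    best = max(itertools.product((False, True), repeat=3),
               key=lambda a: step(state, a)[1])
    state, reward = step(state, best)
    total += reward
print(f"toy schedule cost: ${-total:,.0f}")
```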
Chance-Constrained Outage Scheduling using a Machine Learning Proxy
Outage scheduling aims at defining, over a horizon of several months to
years, when different components needing maintenance should be taken out of
operation. Its objective is to minimize operation-cost expectation while
satisfying reliability-related constraints. We propose a distributed
scenario-based chance-constrained optimization formulation for this problem. To
tackle tractability issues arising in large networks, we use machine learning
to build a proxy for predicting outcomes of power system operation processes in
this context. On the IEEE-RTS79 and IEEE-RTS96 networks, our solution obtains
cheaper and more reliable plans than other candidates.
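A minimal sketch of the proxy idea, with synthetic data and a scikit-learn
classifier standing in for both the real operation simulation and whatever
model the authors actually use (every name and number below is an assumption):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical proxy: a classifier trained on (plan, scenario) features
# -> {0: reliable, 1: violation}, replacing a costly operation simulation.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 8))                     # plan + scenario features
y_train = (X_train[:, :2].sum(axis=1) > 1).astype(int)   # synthetic labels
proxy = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

def violation_probability(plan, scenarios):
    """Scenario-based estimate of P(reliability violation | plan)."""
    features = np.hstack([np.tile(plan, (len(scenarios), 1)), scenarios])
    return proxy.predict(features).mean()

EPSILON = 0.05                                           # chance-constraint level
scenarios = rng.normal(size=(1000, 4))
candidate_plans = rng.normal(size=(20, 4))
feasible = [p for p in candidate_plans
            if violation_probability(p, scenarios) <= EPSILON]
print(f"{len(feasible)} of {len(candidate_plans)} plans satisfy the chance constraint")
```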
Beyond the One-Step Greedy Approach in Reinforcement Learning
The famous Policy Iteration algorithm alternates between policy improvement
and policy evaluation. Implementations of this algorithm with several variants
of the latter evaluation stage, e.g., n-step and trace-based returns, have
been analyzed in previous works. However, the case of multiple-step lookahead
policy improvement, despite the recent increase in empirical evidence of its
strength, has to our knowledge not been carefully analyzed yet. In this work,
we introduce the first such analysis. Namely, we formulate variants of
multiple-step policy improvement, derive new algorithms using these definitions
and prove their convergence. Moreover, we show that recent prominent
Reinforcement Learning algorithms are, in fact, instances of our framework. We
thus shed light on their empirical success and give a recipe for deriving new
algorithms for future study.
Comment: ICML 2018
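For concreteness, the h-step greedy policy with respect to a value estimate
$V$ can be written (in paraphrased notation, not necessarily the paper's own)
as

  $\pi_h(s) \in \arg\max_{a_0} \max_{a_1,\dots,a_{h-1}} \mathbb{E}\big[\sum_{t=0}^{h-1} \gamma^t r(s_t, a_t) + \gamma^h V(s_h) \mid s_0 = s\big]$,

so that $h = 1$ recovers the standard greedy improvement step of Policy
Iteration, while larger $h$ solves a short finite-horizon control problem
before committing to the first action.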
Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning
Multiple-step lookahead policies have demonstrated high empirical competence
in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model
Predictive Control. In a recent work \cite{efroni2018beyond}, multiple-step
greedy policies and their use in vanilla Policy Iteration algorithms were
proposed and analyzed. In this work, we study multiple-step greedy algorithms
in more practical setups. We begin by highlighting a counter-intuitive
difficulty, arising with soft-policy updates: even in the absence of
approximations, and contrary to the 1-step-greedy case, monotonic policy
improvement is not guaranteed unless the update stepsize is sufficiently large.
Taking particular care about this difficulty, we formulate and analyze online
and approximate algorithms that use such a multi-step greedy operator.
Comment: NIPS 2018
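To make the soft-update subtlety concrete: with a mixture update of the form
$\pi_{k+1} = (1-\alpha)\,\pi_k + \alpha\,\pi'_k$, where $\pi'_k$ is a
multi-step greedy policy with respect to $V^{\pi_k}$, the observation above is
that $V^{\pi_{k+1}} \ge V^{\pi_k}$ can fail for small $\alpha$, whereas with
the 1-step greedy $\pi'_k$ any $\alpha \in (0, 1]$ yields monotonic
improvement (this display is my paraphrase of the stated result, not a formula
quoted from the paper).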
Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs
As communication protocols evolve, datacenter network utilization increases.
As a result, congestion is more frequent, causing higher latency and packet
loss. Combined with the increasing complexity of workloads, manual design of
congestion control (CC) algorithms becomes extremely difficult. This calls for
the development of AI approaches to replace the human effort. Unfortunately, it
is currently not possible to deploy AI models on network devices due to their
limited computational capabilities. Here, we offer a solution to this problem
by building a computationally-light solution based on a recent reinforcement
learning CC algorithm [arXiv:2207.02295]. We reduce the inference time of RL-CC
by 500x by distilling its complex neural network into decision trees. This
transformation enables real-time inference within the μ-sec decision-time
requirement, with a negligible effect on quality. We deploy the transformed
policy on NVIDIA NICs in a live cluster. Compared to popular CC algorithms used
in production, RL-CC is the only method that performs well on all benchmarks
tested, over a wide range of flow counts. It balances multiple metrics
simultaneously: bandwidth, latency, and packet drops. These results suggest
that data-driven methods for CC are feasible, challenging the prior belief that
handcrafted heuristics are necessary to achieve optimal performance.
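A minimal sketch of the distillation step, with a synthetic stand-in for the
trained RL-CC network and a scikit-learn tree as the student (all names, the
teacher function, and the feature choices are assumptions for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def teacher_policy(states):
    """Placeholder for the trained NN policy: maps CC state -> rate change."""
    return np.tanh(states @ np.array([0.5, -1.0, 0.25]))  # synthetic stand-in

# Query the teacher on observed congestion states to build imitation labels.
states = rng.normal(size=(50_000, 3))   # e.g. RTT, rate, congestion signals
actions = teacher_policy(states)

# Fit a small tree: a handful of comparisons per decision at inference time,
# orders of magnitude cheaper than a neural-network forward pass.
student = DecisionTreeRegressor(max_depth=10).fit(states, actions)

fidelity = 1 - np.mean((student.predict(states) - actions) ** 2) / np.var(actions)
print(f"distillation fidelity (R^2 on training states): {fidelity:.3f}")
```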
A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound
Policy evaluation in reinforcement learning is often conducted using
two-timescale stochastic approximation, which results in various gradient
temporal difference methods such as GTD(0), GTD2, and TDC. Here, we provide
convergence rate bounds for this suite of algorithms. Algorithms such as these
have two iterates, $\theta_n$ and $w_n$, which are updated using two distinct
stepsize sequences, $\alpha_n$ and $\beta_n$, respectively. Assuming
$\alpha_n = n^{-\alpha}$ and $\beta_n = n^{-\beta}$ with $1 > \alpha > \beta > 0$,
we show that, with high probability, the two iterates converge to their
respective solutions $\theta^*$ and $w^*$ at rates given by
$\|\theta_n - \theta^*\| = \tilde{O}(n^{-\alpha/2})$ and
$\|w_n - w^*\| = \tilde{O}(n^{-\beta/2})$; here, $\tilde{O}$
hides logarithmic terms. Via comparable lower bounds, we show that
these bounds are, in fact, tight. To the best of our knowledge, ours is the
first finite-time analysis which achieves these rates. While it was known that
the two timescale components decouple asymptotically, our results depict this
phenomenon more explicitly by showing that it in fact happens from some finite
time onwards. Lastly, compared to existing works, our result applies to a
broader family of stepsizes, including non-square-summable ones.
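For concreteness, one standard member of this suite is GTD2 with linear
features $\phi$; the following updates are textbook notation and an assumed
form, not a quotation of this paper's exact setup:

  $\delta_n = r_n + \gamma\,\theta_n^\top \phi_{n+1} - \theta_n^\top \phi_n$,
  $w_{n+1} = w_n + \beta_n\,(\delta_n - \phi_n^\top w_n)\,\phi_n$,
  $\theta_{n+1} = \theta_n + \alpha_n\,(\phi_n - \gamma\,\phi_{n+1})(\phi_n^\top w_n)$,

where the fast iterate $w_n$ (larger stepsize $\beta_n$) tracks a quantity
defined by the current slow iterate $\theta_n$.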